S/N 10/542,209 
Response to Office Action of April 9, 2008 



AMENDMENTS TO THE CLAIMS 

This listing of claims will replace all prior listings and versions of claims in the 
application: 

Listing of Claims: 

1. (Currently Amended) A method for performing a supervised learning process in 
an artificial intelligence environment including optimizing a database of sample records for the 
training and testing of a prediction algorithm for predicting the presence or absence of a 
specified medical condition in a patient, the method comprising the steps of: 

defining a set of one or more distributions of the database records onto respective 
training and testing subsets; 

using the defined set of distributions to train and test a first generation set of one or more 
prediction algorithms and assigning a fitness score to each, each of said prediction algorithms 
being associated with a certain distribution of said database records; 

feeding the set of prediction algorithms to an evolutionary algorithm which generates a 
set of one or more second generation prediction algorithms and assigns a fitness score to each; 

continuing to feed each generational set of prediction algorithms to the evolutionary 
algorithm until a termination event occurs, wherein said termination event is at least one 
of: 

a prediction algorithm [[is]] generated with a fitness score equal to or exceeding a 
defined minimum value, 

the maximum fitness score of successive generational sets of prediction 
algorithms converging to a given value, and 

a certain number of generations having been generated; 

selecting a prediction algorithm having a best fitness score; and 

using the distribution of database records associated with said selected prediction 
algorithm in performing supervised learning, said supervised learning including training and 
testing of prediction algorithms to obtain a trained prediction algorithm, wherein 

said method is performed using a computer and computer software forming an 
intelligent system, and 

the trained prediction algorithm is effective to predict output variables for data 
relating to said condition, thereby predicting diagnosis of said condition, 

and further comprising the steps of: 

generating a population of prediction algorithms, wherein each one of said 
prediction algorithms is trained and tested according to a different distribution of the records of 
the data set in the complete database onto a training data set and a testing data set, 

each different distribution being created as one of a random distribution and a 
distribution formed by a deterministic mathematical process characterized as a 
pseudorandom distribution, 

each prediction algorithm of [[the]] said population being trained according to its 
own distribution of records of the training set and [[is]] being validated in a blind way 
according its own distribution on the testing set, and 

a score reached by each prediction algorithm being calculated in the testing 
phase representing its fitness; 

providing an evolutionary algorithm which combines the different models of distribution 
of the records of the complete data set in a training and in a testing set, which sets are 
represented each one by a corresponding prediction algorithm trained and tested on the basis 
of [[the]] said training and testing data set according to the fitness score calculated in the 
previous step for the corresponding prediction algorithm, 

the fitness score of each prediction algorithm corresponding to one of the 
different distributions of the complete data set on the training and the testing data sets 
being the probability of evolution of each prediction algorithm or of each said distribution 
of the complete data set on the training and testing data sets; 

repeating the evolution of the prediction algorithm generation for a finite number of 
generations or till the output of the genetic algorithm converges to a best solution and/or till the 
fitness value of at least some prediction algorithm related to an associated data records 
distribution has reached a desired value; and 

setting the data records distribution for the best solution as the optimized training and 
testing subsets for training and testing prediction algorithm. 

2. (Cancelled) 

3. (Currently Amended) A method according to claim 1, wherein to each record of the 
data set a distribution variable is associated which is binary 
and has at least two status, one of this two status being associated with the inclusion of the 
record in the training set and the other in the testing set. 
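
The binary distribution variable recited in claim 3 can be pictured with a short sketch. This is only an illustration under stated assumptions, not the applicant's implementation: the function names `random_distribution` and `split_records` are hypothetical, and the convention that status 1 means "training set" and status 0 means "testing set" is chosen here for illustration.

```python
import random


def random_distribution(n_records, seed=None):
    """Return one binary status per record (assumed: 1 = training, 0 = testing)."""
    rng = random.Random(seed)  # pseudorandom generator, seedable for repeatability
    return [rng.randint(0, 1) for _ in range(n_records)]


def split_records(records, distribution):
    """Distribute records onto training and testing subsets by their status bit."""
    if len(records) != len(distribution):
        raise ValueError("one distribution variable is needed per record")
    training = [r for r, bit in zip(records, distribution) if bit == 1]
    testing = [r for r, bit in zip(records, distribution) if bit == 0]
    return training, testing
```

Under this reading, each candidate distribution is simply a list of bits as long as the database, which is also the form in which claim 21 describes the genes handled by the genetic algorithm.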

4. (Currently Amended) A method according to claim 1, wherein 
the prediction algorithm is an artificial neural network. 

5. (Currently Amended) A method according to claim 1, wherein the prediction 
algorithm is a classification algorithm. 

6. (Currently Amended) A method according to claim 1, wherein once an optimum 
distribution has been computed, the optimized training 
data subset is made equal to a complete data set being the individuals included in the training 
subset distributed onto a new training set and onto a new testing set each one having about the 
half of the records of the original optimized training set, while the originally optimized testing set 
is used as a third data subset for validation purposes. 

7. (Currently Amended) A method according to claim 6, wherein the distribution of 
the data of the originally optimized training set onto the new 
training and new testing set is optimized by means of a pre-processing phase including the 
steps of said method for optimizing a database of sample records, said records being records in 
the originally optimized training set. 

8. (Currently Amended) A method according to claim 1, wherein different 
choices of the structure of the training subset and the structure of the testing subset consist in 
different selections of the number of input variables of the data records of the database, which 
selections consist in leaving out at least one, preferably two or more variables from the entire 
input variable set forming each record, the records of the database comprising a certain number 
of known input variables and a certain number of known output variables. 

9. (Currently Amended) A method according to claim 8, further 
comprising the following steps: 

defining a distribution of data from the complete data set onto a training data set and 
onto a testing data set; 

generating a population of different prediction algorithms each one having a training 
and/or testing data set in which only some variables have been considered among all the 
original variables provided in the data sets, each one of the prediction algorithms being 
generated by means of a different selection of variables; 

carrying out learning and testing of each prediction algorithm of the population and 
evaluating the fitness score of each prediction algorithm; 

applying an evolutionary algorithm to the population of prediction algorithms for 
achieving new generations of prediction algorithm; 

for each generation of new prediction algorithms representing each one a new different 
selection of input variable, testing or validating the best prediction algorithm according to the 
best hypothesis of input variables selection; and 

evaluating a fitness score and promoting the prediction algorithms, 
representing the selections of input variables which have the best testing performances and the 
minimum input variables, for the processing of the new generations. 

10. (Currently Amended) A method according to claim 8, further comprising a pre- 
processing phase, including the steps of said method for optimizing a database of sample 
records, for selecting the most predictive input variables. 

11. (Currently Amended) A method according to claim 1, 

in which different choices of the structure of the training subset and the structure of the 
testing subset consist in different selections of the number of input variables of the data records 
of the database, which selections consist in leaving out at least one, preferably two or more 
variables from the entire input variable set forming each record, the records of the database 
comprising a certain number of known input variables and a certain number of known output 
variables, 

and further comprising a pre-processing phase, including the steps of said method for 
optimizing a database of sample records, for selecting the most predictive input variables, 

wherein the database subjected to a pre-processing phase of 
input variable selection is a training subset and a testing subset processed with said method. 

12. (Currently Amended) A method according to claim 1, wherein the complete 
database the distribution of the records of which has to be optimized 
has data records having a selected number of input variables, the selection being carried out 
with said method, and in which different choices of the structure of the training subset and the 
structure of the testing subset consist in different selections of the number of input variables of 
the data records of the database, which selections consist in leaving out at least one, preferably 
two or more variables from the entire input variable set forming each record, the records of the 
database comprising a certain number of known input variables and a certain number of known 
output variables. 

13. (Currently Amended) A method according to claim 1, wherein a preprocessing 
phase for optimizing the distribution of the records on a training 
subset and a testing subset and for selecting the most predictive input variables, is carried out 
alternatively one to the other several times. 

14. (Currently Amended) A method according to claim 1, wherein the evolutionary 
algorithm is a genetic algorithm with the following evolutionary 
rules: 

an average health value of the population is computed as a function of the fitness values 
of each single individual in the population; 

coupling, recombination of genes and mutation of genes are carried out in a 
differentiated manner depending on a comparison between the fitness of each individual of the 
couple and the average health value of the entire population to which the individuals belong; 

individuals having a fitness value lower or equal to the average health of the entire 
population are not excluded from the creation of new generations but are marked out and 
entered in a vulnerability list; and 

the number of subjects entered in the vulnerability list defining the number of possible 
marriages. 
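
The bookkeeping recited in claim 14 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed method itself: the claim only says that the average health is "a function of the fitness values", so the arithmetic mean used here is an assumption, as is the reading that the number of possible marriages equals the length of the vulnerability list. The function names are hypothetical.

```python
def average_health(fitness):
    """Average health of the population (assumed here: arithmetic mean of fitness)."""
    return sum(fitness) / len(fitness)


def vulnerability_list(fitness):
    """Indices of individuals whose fitness is lower than or equal to the average health.

    These individuals are marked out, not excluded from creating new generations.
    """
    avg = average_health(fitness)
    return [i for i, f in enumerate(fitness) if f <= avg]


def possible_marriages(fitness):
    """Number of possible marriages (assumed equal to the vulnerability list size)."""
    return len(vulnerability_list(fitness))
```

For a population with fitness values 0.5, 0.25, 0.75 and 1.0, the average health is 0.625, the first two individuals enter the vulnerability list, and two marriages are possible under this reading.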

15. (Currently Amended) A method according to claim 14 in which for coupling 
purposes and for generation of children at least one parent individuals must have a fitness value 
greater than the average health value of the population. 

16. (Currently Amended) A method according to claim 14, wherein each couple 
of individuals can generate individuals having a fitness different from 
the average health, so called offsprings if the fitness of one them, at least is greater than the 
average fitness, the offsprings of each marriage occupying the places of subjects entered in the 
vulnerability list and are marked out, so that a weak individual can continue to exist through his 
own children. 

17. (Currently Amended) A method according to claim 14, wherein coupling 
between individuals having a very low fitness value and a very high fitness 
value are not allowed. 

18. (Currently Amended) A method according to claim 14, wherein the following 
recombination rules of the genes of the parents individuals coupled 
are considered in the case the parents individuals have not common genes: 

the health of father and mother individuals are greater than the average health of the 
entire population; the crossover is a classical crossover according to which the genes of the 
father and of the mother individuals are substituted one with the other starting from a certain 
crossover point; 

the health of father and mother individuals are lower than the average health of the 
entire population; in this case the two children are formed through rejection of the parents genes 
they will receive by the crossover process; 

the health of one of the parents is less than the average health of the entire population 
while the health of the other parent is greater than the average health of the entire population; in 
this case only the parents whose health is greater than the average health of the entire 
population will transmit their genes, while the genes of the parent having an health lower than 
the average health of the entire population are rejected. 
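
The three recombination cases of claim 18 can be sketched for binary genes. This is a hedged illustration only: it assumes, following claim 19, that "rejection" of a gene means switching its status (here, flipping a bit), that every gene contributed by a parent whose health is not greater than the average is rejected, and that a parent exactly at the average is treated as below it. The names `flip` and `recombine` and the `point` parameter are hypothetical.

```python
def flip(genes):
    """Reject binary genes by switching each one to its other status."""
    return [1 - g for g in genes]


def recombine(father, mother, father_fitness, mother_fitness, avg_health, point):
    """Single-point crossover in which below-average parents have their genes rejected.

    - both parents above average: classical crossover at `point`
    - both parents below average: children are built from rejected (flipped) genes
    - one parent above average: only that parent's genes are transmitted unchanged
    """
    if len(father) != len(mother):
        raise ValueError("parents must have the same number of genes")
    f_genes = father if father_fitness > avg_health else flip(father)
    m_genes = mother if mother_fitness > avg_health else flip(mother)
    child_a = f_genes[:point] + m_genes[point:]
    child_b = m_genes[:point] + f_genes[point:]
    return child_a, child_b
```

With a father of all ones, a mother of all zeros and a crossover point of 2, two above-average parents yield `[1, 1, 0, 0]` and `[0, 0, 1, 1]`, while a below-average mother yields two children of all ones, since her zeros are rejected.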

19. (Currently Amended) A method according to claim 18, wherein each gene is 
characterised by a status level, the method further wherein genes rejection 
consists in modifying the status of the genes from one status level to a different status level. 

20. (Currently Amended) A method according to claim 18, wherein a modified 
crossover of the genes of the parents individuals is carried out when the 
parents individuals has part of the genes that coincide, this modified crossover provides for 
generating and offspring in which the genes selected for crossover are the most effective ones 
of the parents. 

21. (Currently Amended) A method according to claim 14, wherein the 
individuals are the different prediction algorithm representing a corresponding different initial 
random distribution of data records onto the testing and the training data set and the genes 
consist in the binary status variable of association of each record to the training and to the 
testing subset. 

22. (Currently Amended) A method according to claim 14, wherein the 
individuals are the prediction algorithms each one representing a different training and testing 
data set, the difference residing in a different selection of input variables for each different 
training and testing subset, and the genes consist in the different selection variable which is 
provided for each input variable in the different training and testing subsets, the above 
mentioned selection variable being a parameter indicating the presence/absence of each 
corresponding input variable in the records of each data set. 

23. (Currently Amended) A method according to claim 1, wherein the method 
is in the form of a software program comprising instructions executable 
by a CPU, the software program being stored in a memory to which the CPU can access. 

24. (Currently Amended) A software program stored on a memory device, wherein 
the software program consisting in the method according to claim 1 in the form of 
executable instructions of a CPU or of a computer system. 

25. (Currently Amended) A system for carrying out a method according to claim 1, 
comprising: 

an apparatus or device for generating an action of response which is autonomously, i.e. 
by itself, chosen among a certain number of different kinds of actions of response stored in a 
memory of the apparatus or autonomously generated by the apparatus basing the choice 
of the kind of action of response on the interpretation of data collected autonomously by 
one or more sensors responsive to physical entities or which are fed to the apparatus by 
means of input means, the said interpretation being made by means of a prediction algorithm in 
the form of a software saved in a memory of the said apparatus and being carried out by a 
central processing unit, 

wherein the apparatus is further provided with means for 
carrying out a training and testing phase of the prediction algorithm by inputting to the 
prediction algorithm data of a known database in which input variables of the input data 
representing the physical entities able to being sensed by the apparatus through the one or 
more sensors and/or able to be fed to the apparatus by the input means are 
univoquely correlated to at least one definite kind of action of response among the different 
kinds of possible action of response, the means for carrying out the training and testing 
being in the form of a training and testing software saved in a memory of the apparatus, the said 
training and testing being carried out by a method according to claim 1, the 
training and testing software program being the said method of training and testing in the form 
of a software program or instructions. 

26. (Currently Amended) The system according to claim 25, 
wherein the system is a system for sound or vocal recognition comprising input means 
responsive to acoustic waves, a processing unit connected to the input means responsive to 
acoustic waves, at least a memory in which a software program is stored the said program 
being in the form according to claims 23 or 24 and comprising coded instructions for enabling 
the processing unit to carry out a method according to claim 1, a further or the same above 
mentioned memory in which a dataset of known data records is stored or can be stored and/or 
input means for storing in the further or the said above mentioned memory a dataset of known 
data records. 

27. (Currently Amended) The system according to claim 25, 
wherein the system is a system for image recognition, the input means being responsible to 
electromagnetic waves, the system being able to recognize the shape of an object generating or 
reflecting electromagnetic waves, and/or the distance and/or the identity of the object. 

28. (Currently Amended) The system according to claim 26, 
wherein the database of known data records comprises acoustic signals emitted by one or more 
objects or one or more living beings making part of the typical environment in which the device 
has to operate or the data relating to one or more images of one or more objects or one or more 
living beings making part of the typical environment in which the device has to operate to which 
are univoquely correlated to corresponding known kind, and/or identity and/or meaning of 
objects to which the said acoustic signals or image data are related and/or from which the said 
acoustic signals or image data are generated. 

29. (Currently Amended) The system according to claim 27, 
wherein the system is a specialized system for image pattern recognition having artificial 
intelligence utilities for analyzing a digitalized image, i.e. an image in the form of a array of 
image data records, each image data record being related to a zone or point or unitary area or 
volume of a two or three dimensional visual image, the 
visual image being formed by an array of the said pixels or voxels and utilities for indicating 
for each image data record a certain quality among a plurality of known qualities of the image 
data records; 

the system having a processing unit, a memory 
in which an image pattern recognition algorithm is stored in the form of a software program 
which can be executed by the processing unit; 

a memory in which a certain number of predetermined different qualities which the 
image data records can assume has been stored and which qualities has to be univoquely 
associated to each of the image data records of an image data array fed to the system; 

input means for receiving arrays of digital image data records or input means for 
generating arrays of digital image data records from an existing image and a memory for storing 
the said digital image data array; 

output means for indicating for each image data record of the image data array a certain 
quality chosen by the processing unit in carrying out the image pattern recognition algorithm in 
the form of the said software program; 

the image pattern recognition algorithm is a prediction algorithm in the form of a software 
program, which prediction algorithm is further associated to a system being further provided with 
a training and testing software program; 

the system is able to carry out training and testing according to the method of claim 1; 

the method is provided in the system in the form of the training and testing software 
program; and 

a database being also provided in which data records are contained univoquely 
associating known image data records of known image data arrays with the corresponding 
known quality from a certain number of predetermined different qualities which the image data 
records can assume. 

30. (Currently Amended) A method for producing a microarray for genotyping 
operations, the [[said]] method comprising the steps of defining a certain number of theoretically 
relevant genes or alleles or polymorphisms considered relevant for a certain biologic condition 
like a tissue structure, a pathology or the potentiality of developing a pathology or an anatomic 
or morphologic feature, the method comprising: 

a) providing a database of experimentally determined data in which each record relates 
to a known clinical or experimental case of a sample population of cases and which records 
comprise a certain number of input variables corresponding to the presence/absence of a 
certain predetermined number of polymorphisms and/or mutations and/or equivalent genes of a 
certain number of theoretically probable relevant genes, said certain predetermined number of 
polymorphisms and/or genes forming a set, and one or more related output variables 
corresponding to the certain biological or pathologic condition of the clinical and 
experimental cases of the sample population; 

b) determining a selection of a subset of the set of certain predetermined number of 
polymorphisms and/or genes by testing the association of the genes or polymorphisms and 
the biological or pathological condition by mathematical tools applied to the database; 

c) the mathematical tools comprising a prediction algorithm such 
as a neural network; 

d) dividing the database into a training and a testing dataset for training and testing the 
prediction algorithm; 

e) defining two or more different training datasets each one having records with a set of 
input variables obtained by excluding one or more input variables from the originally defined 
number of input variables, while for each record the set of input variables of the corresponding 
training set has at least one input variable which is not a member of the set of input variables of 
the other training datasets, each said at least one input variable consisting [[in]] of a different 
gene or a different polymorphism and/or a different mutation and/or a different functionally 
equivalent gene thereof of the originally considered genes or polymorphisms and/or mutations 
and/or functionally equivalent genes thereof considered theoretically potentially relevant for the 
biologic or pathologic condition; 

f) training the prediction algorithm with each of the different training sets defined under 
step e) for generating a first population of different prediction algorithms which are 
divided into two groups of mother and father prediction algorithms and testing the 
prediction algorithms with the associated testing set; 

g) calculating a fitness score or prediction accuracy of each father and mother prediction 
algorithms of the said first population through the testing results; 

i) providing an evolutionary algorithm such as a genetic algorithm and 
applying the evolutionary algorithm to the first population of mother and father prediction 
algorithms for achieving new generation of prediction algorithms whose training and testing 
dataset comprises records whose input variables selections are a combination of the input 
variable selections of the records of the training and of the testing datasets of the first or 
previous population of father and mother prediction algorithms according to the rules of the 
evolutionary algorithm; 

j) for each generation of new prediction algorithms representing each new variant 
selection of input variables, the best prediction algorithm according to the best hypothesis of 
input variable selection being tested or validated through the testing dataset; 

k) evaluating a fitness score and promoting the prediction algorithms 
representing the selections of input variables which have the best testing performance with the 
minimum number of input variables utilized for the processing of new generations; 

l) repeating steps i) to k) until a predetermined fitness score defined as best fit of the 
prediction algorithm and a minimum number of input variables has been reached; and 

m) defining as the selected relevant input variables i.e. as the relevant genes or 
polymorphisms and/or of mutations and/or of functionally equivalent genes thereof the ones 
related to the input variables of the selection represented by the prediction algorithm having 
both at least the predetermined fitness score and also the minimum number of selected input 
variables. 

31. (Currently Amended) A method according to claim 30, 
wherein an optimization of the distribution of the records of the original database in a training 
dataset and in a testing dataset is carried out in one of a pre processing and a post processing 
phase, i.e. before carrying out the steps e) to m) at step d) or after having carried out the steps 
a) to m). 

32. (Currently Amended) The method according to claim 31 comprising the following 
steps of optimization: 

defining a set of one or more distributions of the database records onto respective 
training and testing subsets; 

using the defined set of distributions to train and test a first generation set of one or more 
prediction algorithms and assigning a fitness score to each; 

feeding the set of prediction algorithms to an evolutionary algorithm which generates a 
set of one or more second generation prediction algorithms and assigns a fitness score to each; 
and 

continuing to feed each generational set of prediction algorithms to the evolutionary 
algorithm until a termination event occurs; 

wherein said termination event is at least one of a prediction algorithm is 
generated with a fitness score equalling or exceeding a defined minimum value, the maximum 
fitness score of successive generational sets of prediction algorithms converging to a given 
value, and a certain number of generations having been generated. 
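
The three termination events recited above can be checked with a small helper. This is a sketch under stated assumptions rather than the claimed procedure: the claim does not say how convergence of the maximum fitness score is detected, so the `window` and `tolerance` parameters below are illustrative choices, and the function name `should_terminate` is hypothetical.

```python
def should_terminate(best_per_generation, min_fitness, max_generations,
                     window=3, tolerance=1e-6):
    """Return True when at least one termination event has occurred.

    best_per_generation holds the maximum fitness score of each generation so far.
    """
    if not best_per_generation:
        return False
    # Event 1: a prediction algorithm reached the defined minimum fitness score.
    if best_per_generation[-1] >= min_fitness:
        return True
    # Event 2: a certain number of generations has been generated.
    if len(best_per_generation) >= max_generations:
        return True
    # Event 3: the maximum fitness score has converged (assumed test:
    # it varied by no more than `tolerance` over the last `window` generations).
    if len(best_per_generation) >= window:
        recent = best_per_generation[-window:]
        if max(recent) - min(recent) <= tolerance:
            return True
    return False
```

The evolutionary loop would call this helper once per generation, after the fitness scores of the new generational set have been assigned.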

33. (Currently Amended) The method according to claim 31, further comprising the 
following steps: 

generating a population of prediction algorithm each one of them is trained and tested 
according to a different distribution of the records of the data set in the complete database onto 
a training data set and a testing data set; 

each different distribution being created by a random distribution or a distribution formed 
by a deterministic mathematical process characterized as a pseudo-random distribution; 

each prediction algorithm of the said population is trained according to its own 
distribution of records of the training set and is validated in a blind way according its own 
distribution on the testing set; 

a score reached by each prediction algorithm is calculated in the testing phase 
representing its fitness; 

an evolutionary algorithm being further provided which combines the different models of 
distribution of the records of the complete data set in a training and in a testing set which sets 
are represented each one by a corresponding prediction algorithm trained and tested on the 
basis of the said training and testing data set according to the fitness score calculated in the 
previous step for the corresponding prediction algorithm; 

the fitness score of each prediction algorithm corresponding to one of the different 
distributions of the complete data set on the training and the testing data sets being the 
probability of evolution of each prediction algorithm or of each said distribution of the complete 
data set on the training and testing data sets; 

repeating the evolution of the prediction algorithm generation for a finite number of 
generations or till the output of the genetic algorithm converges to a best solution and/or till the 
fitness value of at least some prediction algorithm related to an associated data records 
distribution has reached a desired value; and 

setting the data records distribution for the best solution as the optimized training and 
testing subsets for training and testing prediction algorithm. 

34. (Currently Amended) A microarray for genotyping comprising a reduced number 
of genes, alleles or polymorphisms characterized in that the reduced number of the said genes, 
alleles or polymorphisms has been selected by means of a method according to claim 30. 

35. (Currently Amended) A method for performing a supervised learning process in 
an artificial intelligence environment including optimizing a database of sample records for the 
training and testing of a prediction algorithm for a problem under investigation characterized by 
input variables and output variables, the prediction algorithm used for predicting output variables 
for real world data, the method comprising the steps of: 

defining a set of one or more distributions of the database records onto respective 
training and testing subsets; 

using the defined set of distributions to train and test a first generation set of one or more 
prediction algorithms and assigning a fitness score to each, each of said prediction algorithms 
being associated with a certain distribution of said database records; 

feeding the set of prediction algorithms to an evolutionary algorithm which generates a 
set of one or more second generation prediction algorithms and assigns a fitness score to each; 

continuing to feed each generational set of prediction algorithms to the evolutionary 
algorithm until a termination event occurs, where said termination event is at least one of 

a prediction algorithm is generated with a fitness score equal to or exceeding a 
defined minimum value, 

the maximum fitness score of successive generational sets of prediction 
algorithms converging to a given value, and 

a certain number of generations having been generated; 

selecting a prediction algorithm having a best fitness score; 

using the distribution of database records associated with said selected prediction 
algorithm in performing supervised learning, said supervised learning including training and 
testing of prediction algorithms to obtain a trained prediction algorithm; and 

using the trained prediction algorithm to predict the output variables relating to the 
problem under investigation where only the input variables are known, 

wherein said method is performed using a computer and computer software 
forming an intelligent system. 